A Comparison of Two Variant Corpora: The Same Content with Different Source

Authors

  • Kyonghee Paik
  • Kiyonori Ohtake
  • Kazuhide Yamamoto
Abstract

To investigate the effect of source language on translations, we compare two variants of a Korean translation corpus. The first variant consists of Korean translations of 162,308 Japanese sentences from the ATR BTEC (Basic Expression Text Corpus). The second variant was made by translating the English translations of those Japanese sentences into Korean. We show that the source language text has a large influence on the target text: even after normalizing orthographic differences, fewer than 8.3% of the sentences in the two variants were identical. We describe in general terms which phenomena differ and then discuss how our analysis can be used in natural language processing.
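The comparison described in the abstract (counting sentence pairs that are identical after orthographic normalization) can be illustrated with a minimal sketch. The abstract does not say how the authors normalized the text, so the `normalize` function below is an assumption: it applies Unicode NFC and drops whitespace and punctuation. The function names and the three toy sentence pairs are hypothetical and are not taken from the corpus.

```python
import unicodedata


def normalize(sentence: str) -> str:
    """Assumed normalization (not the authors' procedure):
    Unicode NFC, then drop all whitespace and punctuation characters."""
    text = unicodedata.normalize("NFC", sentence)
    kept = [
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    ]
    return "".join(kept)


def identical_ratio(variant_a: list[str], variant_b: list[str]) -> float:
    """Fraction of aligned sentence pairs that match after normalization."""
    if len(variant_a) != len(variant_b):
        raise ValueError("the two variants must be sentence-aligned")
    if not variant_a:
        return 0.0
    matches = sum(normalize(a) == normalize(b) for a, b in zip(variant_a, variant_b))
    return matches / len(variant_a)


# Toy, made-up data: Korean sentences as they might come from each source language.
from_japanese = ["감사합니다.", "화장실은 어디입니까?", "얼마예요?"]
from_english = ["감사합니다", "화장실이 어디에 있습니까?", "얼마 예요?"]

ratio = identical_ratio(from_japanese, from_english)
print(f"identical after normalization: {ratio:.1%}")
```

Removing whitespace is one plausible choice for Korean, where spacing conventions vary between writers; a stricter or looser normalization would change the resulting percentage.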


Similar resources

Exploiting Variant Corpora for Machine Translation

This paper proposes the usage of variant corpora, i.e., parallel text corpora that are equal in meaning but use different ways to express content, in order to improve corpus-based machine translation. The usage of multiple training corpora of the same content with different sources results in variant models that focus on specific linguistic phenomena covered by the respective corpus. The propos...


Comparison of the Obtained Results from Two Calibration Methods in MRI Brachytherapy Normoxic Polymer Gel Dosimetry

Introduction: Despite the fact that the clinical implementation of polymer gel dosimetry has been facilitated after the introduction of normoxic gels, there are still complications in its clinical routine use that are under investigation. In the current work, the feasibility of using a normoxic polymer gel dosimeter named MAGICA has been investigated for use in our clinical brachytherapy applic...


Gender-oriented Commonalities among Canadian and Iranian Englishes: An Analysis of Yes/No Question Variants

This study investigates variability in English yes/no questions as well as the commonalities among yes/no question variants produced by members of two different varieties of English: Canadian English native speakers and Iranian EFL learners. Further, it probes the role of gender in the English yes/no question variants produced by Canadian English native speakers and those produced by Iranian EFL l...


Comparison the efficiency of some different Zn source on quantitative and qualitative yield of Lemon verbena

One of the most effective micronutrients on the quantity and quality of medicinal herbs is zinc (Zn). The present study aimed to assess the impact of the foliar application of various Zn sources (0.2% w/v) including Zn chelated by amino acid (ZnAAC), Zn-sulfate and Zn-EDTA on qualitative and quantitative features of Lemon verbena (Lippia citriodora L.). In two harvests, the field experiment was...


A confidence-aware interval-based trust model

It is a common and useful task in a web of trust to evaluate the trust value between two nodes using intermediate nodes. This technique is widely used when the source node has no experience of direct interaction with the target node, or the direct trust is not reliable enough by itself. If trust is used to support decision-making, it is important to have not only an accurate estimate of trust, ...




Publication date: 2004